[SPARK-54452] Fix empty response from SparkConnect server for spark.sql(...) inside FlowFunction
#53156
Conversation
dongjoon-hyun
left a comment
Hi, @SCHJonathan. FYI, the Apache Spark community does not use this kind of tag, '[bugfix]', in PR titles.
fyi @vicennial
cc @sryza
dongjoon-hyun
left a comment
If you don't mind, please file a new JIRA issue as a bug fix instead of reusing SPARK-54020, @SCHJonathan.
...test/scala/org/apache/spark/sql/connect/pipelines/SparkDeclarativePipelinesServerSuite.scala
test(
  "SPARK-54452: spark.sql() outside a pipeline flow function should return a " +
    "sql_command_result") {
Why do we expect the same behavior? According to the log, it is guarded by the insidePipelineFlowFunction condition, isn't it?
Lines 2993 to 2996 in da7389b
if (insidePipelineFlowFunction) {
  result.setRelation(relation)
  return
}
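The branching the reviewer points at can be modeled with a small sketch. This is plain Python with illustrative names (`Response`, `handle_sql_command`), not Spark Connect's actual server API:

```python
# Illustrative model of the guard quoted above: inside a pipeline flow
# function the server returns the raw relation (logical plan) unexecuted;
# outside a flow function, the SQL is executed eagerly and a command
# result is returned instead.
class Response:
    def __init__(self):
        self.relation = None
        self.sql_command_result = None

def handle_sql_command(relation, inside_pipeline_flow_function):
    result = Response()
    if inside_pipeline_flow_function:
        # Mirrors the guarded branch: result.setRelation(relation); return
        result.relation = relation
        return result
    # Eager path: execute and attach a sql_command_result.
    result.sql_command_result = f"executed({relation})"
    return result
```

So the two test cases exercise different branches of the same handler, which is why only one of them is expected to carry a sql_command_result.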
proto.Command
  .newBuilder()
  .setSqlCommand(
    proto.SqlCommand
May I ask why we use a different setSqlCommand in the two test cases? Can we use the following, like the other test case, in the same way?
.setSqlCommand(
proto.SqlCommand
.newBuilder()
.setInput(proto.Relation
.newBuilder()
.setSql(proto.SQL.newBuilder().setQuery("SELECT * FROM RANGE(5)"))
.build())
.build())
.build())
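The nested fluent-builder style suggested above can be mimicked with plain Python stand-ins. This is a hedged sketch of the builder idiom only, not the real generated protobuf classes:

```python
# Toy fluent builder mirroring the proto.SqlCommand / Relation / SQL
# nesting shown above. Each set() returns self so calls chain, and
# build() freezes the current state into a plain dict.
class Builder:
    def __init__(self, kind):
        self._kind = kind
        self._fields = {}

    def set(self, name, value):
        self._fields[name] = value
        return self  # returning self enables method chaining

    def build(self):
        return {"kind": self._kind, **self._fields}

def sql(query):
    return Builder("SQL").set("query", query).build()

def relation(inner_sql):
    return Builder("Relation").set("sql", inner_sql).build()

def sql_command(input_relation):
    return Builder("SqlCommand").set("input", input_relation).build()
```

Using one nesting shape in both test cases keeps the two assertions directly comparable.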
Gentle ping, @SCHJonathan.
Closes #53156 from SCHJonathan/jonathan-chang_data/fix-spark-sql-bug.
Authored-by: Yuheng Chang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 997525c)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/4.1.
What changes were proposed in this pull request?
In PR #53024, we added SDP support for `spark.sql(...)` inside a FlowFunction. For these calls, instead of eagerly executing the SQL, the Spark Connect server should return the raw logical plan to the client and defer execution to the flow function. However, in that PR we constructed the response object but forgot to actually return it to the Spark Connect client, so the client received an empty response.
This went unnoticed in tests because, when the client sees an empty `spark.sql(...)` response, it falls back to creating an empty DataFrame holding the raw logical plan, which happens to match the desired behavior. This PR fixes the bug by returning the proper response instead of relying on that implicit fallback.
Why are the changes needed?
Does this PR introduce any user-facing change?
This PR fixes a bug introduced in #53024 where the server did not return the constructed spark.sql(...) response to the client.
How was this patch tested?
New tests
Was this patch authored or co-authored using generative AI tooling?
No
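The interaction described above, a server bug masked by the client's fallback, can be sketched with hypothetical helpers. The names `server_handle_sql` and `client_sql` are illustrative only, not pyspark's real functions:

```python
# Models the bug: the server built a response but never returned it, so
# the client saw an empty message and silently fell back to wrapping the
# raw logical plan itself, which coincidentally matched the intended
# behavior and kept the tests green.
def server_handle_sql(query, inside_flow_function, fixed):
    """Return the wire response as a dict; {} models an empty response."""
    if inside_flow_function:
        response = {"relation": query}  # raw logical plan, not executed
        return response if fixed else {}  # bug: built but never sent
    return {"sql_command_result": f"executed({query})"}

def client_sql(query, inside_flow_function, fixed):
    resp = server_handle_sql(query, inside_flow_function, fixed)
    if not resp:
        # Client fallback: empty response -> DataFrame that just holds
        # the raw logical plan, deferring execution.
        return {"relation": query, "via_fallback": True}
    return {**resp, "via_fallback": False}
```

In this model the buggy and fixed servers produce the same observable plan for flow-function calls, differing only in whether the client's implicit fallback was needed, which is why a dedicated test had to assert on the response itself.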